<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML LANG="EN">
<HEAD>
<TITLE>VT100.net: Digital VT220 Programmer Reference Manual</TITLE>
<META NAME="DESCRIPTION" CONTENT="Digital VT220 Programmer Reference Manual, Chapter 2: Character Encoding">
<LINK HREF="vt220-rm.css" TYPE="text/css" REL="STYLESHEET">
</HEAD>
<BODY>
<DIV CLASS="navbar"><A HREF="http://vt100.net/"><IMG CLASS="button" SRC="vt100.net-logo.png" ALT="VT100.net" HEIGHT="16" WIDTH="102"></A> VT220 Programmer Reference Manual<TABLE WIDTH="100%">
<COL SPAN="3" WIDTH="33%">
<TBODY>
<TR>
<TD ALIGN="LEFT"><A HREF="chapter1.html">Chapter 1</A></TD>
<TD ALIGN="CENTER"><A HREF="contents.html">Contents</A></TD>
<TD ALIGN="RIGHT"><A HREF="chapter3.html">Chapter 3</A></TD>
</TR>
</TBODY>
</TABLE>
<HR></DIV>
<H1 ID="S2">2 Character Encoding</H1>
<H2 ID="S2.1">2.1 General</H2>
<P>This chapter describes the character encoding concepts for the
VT220 when operating in text mode. The chapter also describes the
VT220 character sets and provides an overview of the control
functions. You must have a basic understanding of the coding
concepts described in this chapter before using the control
functions described in Chapters <A TITLE="Transmitted Codes" HREF="chapter3.html">3</A> and <A TITLE="Received Codes" HREF="chapter4.html">4</A>.</P>
<H2 ID="S2.2">2.2 Coding Standards</H2>
<P>The VT220 uses an 8-bit character encoding scheme and a 7-bit code
extension technique that are compatible with the following ANSI
and ISO standards. ANSI (American National Standards Institute)
and ISO (International Organization for Standardization) specify
the current standards for character encoding used in the
communications industry.</P>
<TABLE WIDTH="100%">
<THEAD>
<TR>
<TH ALIGN="LEFT">Standard</TH>
<TH ALIGN="LEFT">Description</TH>
</TR>
</THEAD>
<TBODY>
<TR VALIGN="TOP">
<TD>ANSI X3.4 &#8211; 1977</TD>
<TD>American Standard Code for Information
Interchange (ASCII)</TD>
</TR>
<TR VALIGN="TOP">
<TD>ISO 646 &#8211; 1977</TD>
<TD>7-Bit Coded Character Set for
Information Processing Interchange</TD>
</TR>
<TR VALIGN="TOP">
<TD>ANSI X3.41 &#8211; 1974</TD>
<TD>Code Extension Techniques for Use with
the 7-Bit Coded Character Set of
American National Standard Code for Information
Interchange</TD>
</TR>
<TR VALIGN="TOP">
<TD>ISO Draft International Standard 2022.2</TD>
<TD>7-Bit and 8-Bit Coded Character Sets &#8211; Code Extension Techniques</TD>
</TR>
<TR VALIGN="TOP">
<TD>ANSI X3.32 &#8211; 1973</TD>
<TD>Graphic Representation of the Control Characters of American National Standard Code for Information Interchange</TD>
</TR>
<TR VALIGN="TOP">
<TD>ANSI X3.64 &#8211; 1979</TD>
<TD>Additional Controls for Use with
American National Standard Code for Information Interchange</TD>
</TR>
<TR VALIGN="TOP">
<TD>ISO Draft International Standard 6429.2</TD>
<TD>Additional Control Functions for Character Imaging Devices</TD>
</TR>
</TBODY>
</TABLE>
<H2 ID="S2.3">2.3 Code Table</H2>
<P>A code table is a convenient way to represent 7-bit and 8-bit
characters, because you can see groupings of characters and their
relative codes clearly.</P>
<H3 ID="S2.3.1">2.3.1 7-Bit ASCII Code Table</H3>
<P><A TITLE="7-Bit ASCII Code Table" HREF="table2-1.html">Table 2-1</A> is the 7-bit ASCII code table. There are 128 positions
corresponding to 128 character codes, arranged in a matrix of 8
columns and 16 rows.</P>
<P>Each row represents a possible value of the four least significant
bits of a 7-bit code (<A TITLE="7-Bit Code" HREF="figure2-1.html">Figure 2-1</A>). Each column represents a
possible value of the three most significant bits.</P>
<P><A TITLE="7-Bit ASCII Code Table" HREF="table2-1.html">Table 2-1</A> shows the octal, decimal, and hexadecimal code for each
ASCII character. You can also represent any character by its
position in the table. For example, the character H (column 4, row
8) can be represented as 4/8. This column/row notation is used to
represent characters and codes throughout this manual. For
example:</P>
<PRE>1/11 2/3 3/6
ESC   #   6</PRE>
<P>means that the ESC character is at column 1, row 11; the character
# is at column 2, row 3; and the character 6 is at column 3, row
6.</P>
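<P>The column/row notation can be sketched in a few lines of Python. This
is only an illustration: the function names are hypothetical and are not
part of the VT220 or of any library. The column supplies the most
significant bits of the code and the row supplies the four least
significant bits, so the same arithmetic also covers the 16-column 8-bit
table.</P>
<PRE>
```python
def position_to_code(column, row):
    """Return the character code for a column/row position such as 4/8."""
    if column not in range(16) or row not in range(16):
        raise ValueError("column and row must each be in the range 0 to 15")
    # The column holds the most significant bits; the row holds the
    # four least significant bits.
    return column * 16 + row


def code_to_position(code):
    """Return the (column, row) position of a 7-bit or 8-bit code."""
    if code not in range(256):
        raise ValueError("code must be in the range 0 to 255")
    return divmod(code, 16)


def format_position(code):
    """Format a code in column/row notation, for example '1/11'."""
    column, row = code_to_position(code)
    return f"{column}/{row}"


if __name__ == "__main__":
    print(format_position(ord("H")))  # 4/8
    # ESC # 6 built from its three positions
    print(bytes(position_to_code(c, r) for c, r in [(1, 11), (2, 3), (3, 6)]))
```
</PRE>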
<P>The VT220 processes received characters based on two character
types defined by ANSI: graphic characters and control characters.</P>
<P><DFN><STRONG>Graphic characters</STRONG></DFN> are characters you can display on a video
screen. The ASCII graphic characters are in positions 2/1 through
7/14 of <A TITLE="7-Bit ASCII Code Table" HREF="table2-1.html">Table 2-1</A>. They include all American and English
alphanumeric characters, plus punctuation marks and various text
symbols. Examples are: C, n, ", !, +, and $ (the English pound
sign is not an ASCII graphic character).</P>
<P><DFN><STRONG>Control characters</STRONG></DFN> are not displayed. They are single-byte codes
that perform specific functions in data communications and text
processing. The ASCII control characters are in positions 0/0
through 1/15 (columns 0 and 1) of <A TITLE="7-Bit ASCII Code Table" HREF="table2-1.html">Table 2-1</A>. The SP character
(space, 2/0) may act as a graphic character or a control
character, depending on the context. DEL (7/15) is always a
control character.</P>
<P>Control character codes and functions are standardized by ANSI.
Examples of ASCII control characters with their ANSI-standard
mnemonics are CR (carriage return), FF (form
feed), and CAN (cancel).</P>
<H3 ID="S2.3.2">2.3.2 8-Bit Code Table</H3>
<P>In general, the conventions for 7-bit character encoding also
apply to 8-bit character encoding for the VT220. <A HREF="table2-2.html">Table 2-2</A> shows
the 8-bit code table. It has twice as many columns as the 7-bit
table, because it contains 256 (versus 128) code values.</P>
<P>As with the 7-bit table, each row represents a possible value of
the four least significant bits of an 8-bit code (<A HREF="figure2-2.html">Figure 2-2</A>).
Each column represents a possible value of the four most
significant bits.</P>
<P>All codes on the left half of the 8-bit table (columns 0 through
7) are 7-bit compatible; their eighth bit is not set, and can be
ignored or assumed to be 0. You can use these codes in a 7-bit or
an 8-bit environment. All codes on the right half of the table
(columns 8 through 15) have their eighth bit set. You can use
these codes only in an 8-bit compatible environment.</P>
<P>The 8-bit code table (<A HREF="table2-2.html">Table 2-2</A>) has two sets of control
characters, C0 (control zero) and C1 (control one). The table also
has two sets of graphic characters, GL (graphic left) and GR
(graphic right).</P>
<P>On the VT220, the basic functions of the C0 and C1 codes are
defined by ANSI. C0 codes represent the ASCII control characters
described earlier. The C0 codes are 7-bit compatible. The C1 codes
represent 8-bit control characters that let you perform additional
functions beyond those possible with the C0 codes. You can only
use C1 codes directly in an 8-bit environment. Some C1 code
positions are blank, because their functions are not yet
standardized.</P>
<P CLASS="note">NOTE: The VT220 does not recognize all C0 and
C1 codes. <A HREF="chapter4.html">Chapter 4</A> identifies and
describes the codes it does recognize;
all others are ignored (that is,
no action is taken).</P>
<P>The GL and GR sets of codes are reserved for graphic characters.
There are 94 GL codes in positions 2/1 through 7/14, and 94 GR
codes in positions 10/1 through 15/14. By ANSI standards,
positions 10/0 and 15/15 are not used. You can use GL codes in
7-bit or 8-bit environments. You can use GR codes only in an 8-bit
environment.</P>
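<P>The layout of the 8-bit code table described above can be summarized
as a small classifier. This is a sketch only; the function name and the
returned labels are illustrative choices, not terminology required by the
standards. SP, DEL, and the two unused GR-side positions are reported
separately because the text treats them as special cases.</P>
<PRE>
```python
def classify(code):
    """Name the region of the 8-bit code table that a code falls in."""
    if code not in range(256):
        raise ValueError("code must be in the range 0 to 255")
    column = code // 16
    if column in (0, 1):
        return "C0"
    if column in (8, 9):
        return "C1"
    if code == 0x20:
        return "SP"      # 2/0: graphic or control, depending on context
    if code == 0x7F:
        return "DEL"     # 7/15: always a control character
    if code in (0xA0, 0xFF):
        return "unused"  # 10/0 and 15/15
    # Columns 2 through 7 are GL; columns 10 through 15 are GR.
    return "GL" if column in range(2, 8) else "GR"


if __name__ == "__main__":
    for sample in (0x1B, 0x48, 0x9B, 0xC0):
        print(hex(sample), classify(sample))
```
</PRE>
<P>Counting the codes that classify as GL or GR gives 94 in each case,
which matches the figures quoted above.</P>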
<H2 ID="S2.4">2.4 Character Sets</H2>
<P>You cannot change the functions of the C0 or C1 codes. However,
you can map different sets of graphic characters into the GL
and/or GR codes. The sets are stored in the terminal as a graphic
repertoire. But you cannot use these graphics character sets until
you map them into the GL or GR codes. <A HREF="chapter4.html">Chapter 4</A> describes the
commands for mapping graphic character sets into GL or GR.</P>
<P>The terminal's graphic repertoire consists of the following
character sets, each described in the sections below.</P>
<UL>
<LI>DEC multinational (consists of the ASCII graphics set and
DEC supplemental graphics set)</LI>
<LI>DEC special graphics</LI>
<LI>National replacement character (NRC) sets</LI>
<LI>Down-line-loadable character set</LI>
</UL>
<H3 ID="S2.4.1">2.4.1 DEC Multinational Character Set</H3>
<P>By factory default, when you power up or reset the terminal, the
DEC multinational character set (<A HREF="table2-3a.html">Table 2-3</A>) is mapped into the
8-bit code matrix (columns 0 through 15).</P>
<P>The 7-bit compatible left half of the DEC multinational set is the
ASCII character set: the C0 codes are the ASCII control characters,
and the GL codes are the ASCII graphics set.</P>
<P>The 8-bit compatible right half of the DEC multinational set
includes the C1 8-bit control characters in columns 8 and 9. The
GR codes are the DEC supplemental graphics set. The DEC
supplemental graphics set has alphabetic characters with accents
and diacritical marks that appear in the major Western European
alphabets. It also has other symbols not included in the ASCII
graphics set.</P>
<P>The terminal can work with over a dozen national (Western
European) keyboards. All keyboards assume the default DEC
multinational character set mapping. The code descriptions in the
rest of this manual also assume this mapping. Various characters
from the DEC supplemental graphics set appear as standard
(printing character) keys on different keyboards.</P>
<P>The DEC supplemental graphics character set is not available in
VT52 and VT100 modes.</P>
<H3 ID="S2.4.2">2.4.2 DEC Special Graphics Character Set</H3>
<P>The terminal's graphic repertoire includes the DEC special
graphics set (also known as the VT100 line drawing character set).
This character set (<A HREF="table2-4.html">Table 2-4</A>) has about two-thirds of the ASCII
graphic characters. It also has special symbols and short line
segments. The line segments let you create a limited range of
pictures while still using text mode.</P>
<P>Commands described in <A HREF="chapter4.html">Chapter 4</A> let you map the DEC special
graphics set into either GL or GR, replacing either the ASCII
graphics set or the DEC supplemental graphics set. Digital
recommends that you switch between ASCII and DEC special graphics
in GL, because the latter has most of the ASCII graphic
characters. Also, this mapping is compatible with a VT100
terminal.</P>
<H3 ID="S2.4.3">2.4.3 National Replacement Character Sets (NRC sets)</H3>
<P>The terminal's graphic character repertoire includes national
replacement character sets (Tables <A HREF="table2-5.html">2-5</A> through <A HREF="table2-15.html">2-15</A>). These sets
are available when you select national mode. Only one national
character set is available for use at any one time. The NRC set
used depends on the keyboard selection in set-up, as outlined
below.</P>
<TABLE>
<COL SPAN="2" ALIGN="LEFT">
<THEAD>
<TR>
<TH>Keyboard Selection</TH>
<TH>NRC Set</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD>British</TD>
<TD><A HREF="table2-5.html">British</A></TD>
</TR>
<TR>
<TD>Danish</TD>
<TD><A HREF="table2-12.html">Norwegian/Danish</A></TD>
</TR>
<TR>
<TD>Dutch</TD>
<TD><A HREF="table2-6.html">Dutch</A></TD>
</TR>
<TR>
<TD>Finnish</TD>
<TD><A HREF="table2-7.html">Finnish</A></TD>
</TR>
<TR>
<TD>Flemish</TD>
<TD><A HREF="table2-8.html">French</A></TD>
</TR>
<TR>
<TD>French/Belgian</TD>
<TD><A HREF="table2-8.html">French</A></TD>
</TR>
<TR>
<TD>French Canadian</TD>
<TD><A HREF="table2-9.html">French Canadian</A></TD>
</TR>
<TR>
<TD>German</TD>
<TD><A HREF="table2-10.html">German</A></TD>
</TR>
<TR>
<TD>Italian</TD>
<TD><A HREF="table2-11.html">Italian</A></TD>
</TR>
<TR>
<TD>Norwegian</TD>
<TD><A HREF="table2-12.html">Norwegian/Danish</A></TD>
</TR>
<TR>
<TD>Spanish</TD>
<TD><A HREF="table2-13.html">Spanish</A></TD>
</TR>
<TR>
<TD>Swedish</TD>
<TD><A HREF="table2-14.html">Swedish</A></TD>
</TR>
<TR>
<TD>Swiss (French)</TD>
<TD><A HREF="table2-15.html">Swiss</A></TD>
</TR>
<TR>
<TD>Swiss (German)</TD>
<TD><A HREF="table2-15.html">Swiss</A></TD>
</TR>
</TBODY>
</TABLE>
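<P>A host program that needs to know which NRC set is in effect for a
given keyboard selection could hold the table above as a simple lookup.
The sketch below is an illustration only: the dictionary and function
names are hypothetical, and the strings are the labels used in the table
rather than codes sent to or by the terminal.</P>
<PRE>
```python
# Keyboard selection (set-up) mapped to the NRC set it selects.
NRC_SET_FOR_KEYBOARD = {
    "British": "British",
    "Danish": "Norwegian/Danish",
    "Dutch": "Dutch",
    "Finnish": "Finnish",
    "Flemish": "French",
    "French/Belgian": "French",
    "French Canadian": "French Canadian",
    "German": "German",
    "Italian": "Italian",
    "Norwegian": "Norwegian/Danish",
    "Spanish": "Spanish",
    "Swedish": "Swedish",
    "Swiss (French)": "Swiss",
    "Swiss (German)": "Swiss",
}


def nrc_set_for(keyboard):
    """Return the NRC set name for a keyboard selection."""
    try:
        return NRC_SET_FOR_KEYBOARD[keyboard]
    except KeyError:
        raise ValueError(f"unknown keyboard selection: {keyboard!r}") from None


if __name__ == "__main__":
    print(nrc_set_for("Danish"))  # Norwegian/Danish
```
</PRE>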
<H3 ID="S2.4.4">2.4.4 Down-Line-Loadable Character Set</H3>
<P>The terminal provides for a 94-character down-line-loadable
graphic character set. You can define this character set and map
it into either GL or GR, as described in <A HREF="chapter4.html">Chapter 4</A>. This feature
is available only in VT200 mode.</P>
<H2 ID="S2.5">2.5 Control Functions</H2>
<P>You use control functions in your program to specify how the
terminal should handle data. There are many uses for control
functions. Here are some examples.</P>
<UL>
<LI>Move the cursor on the display.</LI>
<LI>Delete a line of text from the display.</LI>
<LI>Change character and line attributes.</LI>
<LI>Change graphic character sets.</LI>
<LI>Set the terminal operating mode.</LI>
</UL>
<P>You can use all control functions in text mode and express them as
single-byte or multibyte codes.</P>
<P>The single-byte codes are the C0 and C1 control characters. Your
program can perform a limited number of functions using C0
characters. C1 characters give you a few more functions, but your
program can use them directly only in an 8-bit environment.</P>
<P>Multibyte control codes represent far more functions, because they
have more possible code combinations. These codes are called
<STRONG>escape sequences</STRONG>, <STRONG>control sequences</STRONG>, and <STRONG>device control strings</STRONG>.
Some sequences are ANSI standardized and used throughout the
industry; others are private sequences created by manufacturers
like Digital for specific families of products. Private sequences,
like ANSI standardized sequences, follow ANSI standards for
character code composition.</P>
<H3 ID="S2.5.1">2.5.1 Escape Sequences</H3>
<P>An escape sequence starts with the C0 character ESC (1/11),
followed by one or more ASCII graphic characters. For example,</P>
<PRE>1/11 2/3 3/6
ESC   #   6</PRE>
<P>is an escape sequence that changes the current line of text to
double-width characters.</P>
<P>Because escape sequences use only 7-bit characters, you can use
them in 7-bit or 8-bit environments.</P>
<P CLASS="note">NOTE: When using escape or control sequences,
remember that it is the code that
defines a sequence -- not the graphic
representation of the characters. The
characters are shown for readability
only and presume the DEC multinational
character set mapping (ASCII graphics
set in GL and DEC supplemental graphics
set in GR).</P>
<P>One important use of escape sequences is extending the capability
of 7-bit control functions. ANSI lets you use 2-byte escape
sequences as 7-bit code extensions to express each of the C1
control codes. This is a valuable feature when your application
must be compatible with a 7-bit environment. For example, the C1
characters CSI, SS3, and IND can be expressed as follows.</P>
<PRE>                    <B>7-Bit Code Extension Equivalent</B>
<B>C1 Character</B>         <B>(Escape Sequence)</B>

9/11                1/11  5/11
CSI                 ESC    [

8/15                1/11  4/15
SS3                 ESC    O

8/4                 1/11  4/4
IND                 ESC    D</PRE>
<P>In general, you can use the above code extension technique in two
ways.</P>
<OL>
<LI>You can express any C1 control character as a 2-character
escape sequence whose second character has a code that is
40 (hexadecimal), or 64 (decimal), less than that of the
C1 character.</LI>
<LI>You can make any escape sequence whose second character
is in the range of 4/0 through 5/15 one byte shorter by
removing the ESC and adding 40 (hexadecimal) to the code
of the second character. This generates an 8-bit control
character.</LI>
</OL>
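<P>Both directions of the code extension technique reduce to adding or
subtracting 40 (hexadecimal). The following sketch shows the arithmetic;
the function names are illustrative and are not defined by the VT220 or
by ANSI.</P>
<PRE>
```python
ESC = 0x1B


def c1_to_escape(code):
    """Express a C1 control character as its 2-byte 7-bit escape sequence."""
    if code not in range(0x80, 0xA0):
        raise ValueError("C1 codes lie in columns 8 and 9 (80 to 9F hex)")
    return bytes([ESC, code - 0x40])


def escape_to_c1(second):
    """Collapse ESC plus a second character in 4/0..5/15 into one C1 byte."""
    if second not in range(0x40, 0x60):
        raise ValueError("second character must be in the range 4/0 to 5/15")
    return bytes([second + 0x40])


if __name__ == "__main__":
    print(c1_to_escape(0x9B))        # CSI as ESC [
    print(c1_to_escape(0x8F))        # SS3 as ESC O
    print(c1_to_escape(0x84))        # IND as ESC D
    print(escape_to_c1(ord("[")))    # ESC [ as CSI
```
</PRE>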
<H3 ID="S2.5.2">2.5.2 Control Sequences</H3>
<P>A control sequence starts with CSI (9/11), followed by one or more
ASCII graphic characters. But CSI (9/11) can also be expressed as
the 7-bit code extension ESC&nbsp;[ (1/11, 5/11). So you can express
all control sequences as escape sequences whose second character
code is [ (5/11). For example, the following two sequences are
equivalent. (They cause
the display to use 132 columns per line rather than 80.)</P>
<PRE>9/11 3/15 3/3 6/8
CSI   ?    3    h

1/11 5/11 3/15 3/3 6/8
ESC   [    ?    3   h</PRE>
<P>Whenever possible, you should use CSI instead of ESC&nbsp;[ to
introduce a control sequence. CSI uses one fewer byte than ESC&nbsp;[,
so the sequence is shorter to send and process. However, you can only
use a sequence starting with CSI in an 8-bit environment (because CSI
is a C1 control character).</P>
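<P>A program that must run in both environments can choose the
introducer when it builds a sequence. The sketch below is illustrative:
the function name and the eight_bit parameter are assumptions of this
example, and it defaults to the 7-bit form because that form is accepted
in either environment.</P>
<PRE>
```python
CSI_8BIT = b"\x9b"    # CSI, 9/11
CSI_7BIT = b"\x1b["   # ESC [, 1/11 5/11


def control_sequence(body, eight_bit=False):
    """Build a control sequence from the characters that follow the introducer."""
    introducer = CSI_8BIT if eight_bit else CSI_7BIT
    return introducer + body.encode("ascii")


if __name__ == "__main__":
    # The 132-column example from the text, in both forms.
    print(control_sequence("?3h", eight_bit=True))
    print(control_sequence("?3h"))
```
</PRE>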
<H3 ID="S2.5.3">2.5.3 Device Control Strings</H3>
<P>A device control string (DCS) is a delimited string of characters
used in a data stream as a logical entity for control purposes. It
consists of an opening delimiter (a device control string
introducer), a command string (data), and a closing delimiter (a
string terminator).</P>
<P>You use device control strings to down-line-load character sets
and definitions for user-defined keys.</P>
<P>The VT220 uses the following device control string format.</P>
<PRE>9/0            ..........        9/12
DCS            Data              ST

Device                           String
Control         .UDK             Terminator
String          .Character Set   (closing delimiter)
(opening
delimiter)</PRE>
<P>DCS is an 8-bit control character. You can also express it as
ESC&nbsp;P (1/11, 5/0) when coding for a 7-bit environment.</P>
<P>ST is an 8-bit control character. You can also express it as
ESC&nbsp;\ (1/11, 5/12) when coding for a 7-bit environment.</P>
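<P>The framing of a device control string can be sketched as follows.
Only the delimiters are taken from the text above; the function name and
the eight_bit parameter are assumptions of this example, and the payload
shown is a placeholder rather than a real UDK or character set
definition.</P>
<PRE>
```python
DCS_8BIT, ST_8BIT = b"\x90", b"\x9c"        # 9/0 and 9/12
DCS_7BIT, ST_7BIT = b"\x1bP", b"\x1b\\"     # ESC P and ESC \


def device_control_string(data, eight_bit=False):
    """Wrap a command string in DCS ... ST delimiters."""
    if eight_bit:
        return DCS_8BIT + data + ST_8BIT
    return DCS_7BIT + data + ST_7BIT


if __name__ == "__main__":
    placeholder = b"DATA"   # stands in for a real command string
    print(device_control_string(placeholder, eight_bit=True))
    print(device_control_string(placeholder))
```
</PRE>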
<H2 ID="S2.6">2.6 Working with 7-Bit and 8-Bit Environments</H2>
<P>There are two requirements for using the terminal's 8-bit
character set. Your program and communication environment must be
8-bit compatible, and the terminal must operate in a VT200 mode.
When operating in VT100 or VT52 mode, you are limited to working
in a 7-bit environment. The following sections describe
conventions that apply in VT200 mode.</P>
<H3 ID="S2.6.1">2.6.1 Conventions for Codes Received by the Terminal</H3>
<P>The terminal expects to receive character codes in a form
consistent with 8-bit coding. Your application can use the C0 and
C1 control codes, as well as the 7-bit C1 code extensions, if
necessary. The terminal always interprets these codes correctly.
<A HREF="chapter4.html">Chapter 4</A> describes all code extensions you may need to use, and
their equivalent C1 control codes.</P>
<P>When your program sends GL or GR codes, the terminal interprets
the codes according to the graphic character mapping currently in
use. The factory-default mapping (which is set when you power up
or reset the terminal) is the DEC multinational character set.
This mapping assumes the current terminal mode is one of the VT200
modes.</P>
<H3 ID="S2.6.2">2.6.2 Conventions for Codes Sent by the Terminal</H3>
<P>Codes sent by the terminal to a program can come directly from the
keyboard or in response to commands issued from the host
(application program or operating system). In a VT200 mode, the
terminal always sends all GL and GR graphic codes to the
application exactly as they are generated, regardless of whether
the application handles 8-bit codes correctly or not. If, however,
a 7-bit communications line is used, C1 controls are sent as
escape sequences and the terminal does not allow the generation of
8-bit graphic codes.</P>
<P>Most function keys on the keyboard generate multibyte control
codes. Many of these codes start with either CSI (9/11) or SS3
(8/15), which are C1 characters. If your application environment
cannot handle 8-bit codes, you can make the terminal automatically
convert all C1 codes to their equivalent 7-bit code extensions
before sending them to the application. To convert C1 codes, you
use the DECSCL commands described in <A HREF="chapter4.html">Chapter 4</A>.</P>
<P>By default, the terminal is set to automatically convert all C1
codes sent to the application to 7-bit code extensions. However,
to ensure the correct mode of operation, always use the
appropriate DECSCL commands described in <A HREF="chapter4.html">Chapter 4</A>.</P>
<P CLASS="note">NOTE: New programs should accept both 7-bit
and 8-bit forms of the C1 controls.</P>
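<P>One way for a program to accept both forms is to fold each 7-bit code
extension into its C1 equivalent before parsing, so that the rest of the
program sees only one form. The following is a minimal sketch under
simplifying assumptions: the function name is hypothetical, the whole
input is available in one buffer (an ESC that arrives as the last byte is
left unchanged), and no attempt is made to treat the data inside a device
control string differently.</P>
<PRE>
```python
ESC = 0x1B


def normalize_c1(data):
    """Replace each ESC + (4/0..5/15) pair with the equivalent C1 byte."""
    out = bytearray()
    i = 0
    while i != len(data):
        byte = data[i]
        has_next = i + 1 != len(data)
        if byte == ESC and has_next and data[i + 1] in range(0x40, 0x60):
            # Adding 40 (hexadecimal) yields the 8-bit control character.
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(normalize_c1(b"\x1b[A"))   # 7-bit form becomes CSI A
    print(normalize_c1(b"\x9bA"))    # 8-bit form passes through
    print(normalize_c1(b"\x1b#6"))   # not a C1 equivalent, left alone
```
</PRE>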
<H2 ID="S2.7">2.7 Display Controls Mode</H2>
<P>The terminal has a display controls mode that lets you display
control codes as graphic characters for debugging purposes (rather
than executing them). You can select this mode by changing the
"Interpret/Display Controls" field in the Set-Up Display screen.
You cannot select this mode with an escape sequence or otherwise
invoke it from the host computer.</P>
<P>When the terminal is in a VT200 mode, selecting the set-up
"Display Controls" field temporarily loads C0, GL, C1, and GR as
shown in <A HREF="table2-16a.html">Table 2-16</A>. All characters are displayed in the font
shown for C0, GL, C1, and GR.</P>
<P>When the terminal is in a VT52 or VT100 mode, selecting the set-up
"Display Controls" field temporarily loads C0 and GL as shown in
<A HREF="table2-16a.html">Table 2-16</A>. All characters are displayed in the font shown for C0
and GL. (C1 and GR are meaningless in VT52 or VT100 modes.)</P>
<P>When you select the "Display Controls" field, the terminal
displays all control functions and prevents most from executing.
There are only two exceptions: LF, FF, and VT cause a new line (CR
LF), and XOFF (DC3) and XON (DC1) maintain flow control if
enabled. LF, FF, and VT are displayed before the CR LF is executed, and
DC1 and DC3 are displayed after execution.</P>
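<P>Because this mode cannot be invoked from the host, a similar view of
a data stream can be approximated in software when debugging. The sketch
below is a loose imitation, not a reproduction of Table 2-16: it shows C0
codes and DEL as their standard mnemonics in braces, shows 8-bit codes as
hexadecimal in braces (an assumption of this example), starts a new line
after displaying LF, VT, or FF, and does not model flow control. The
function name is hypothetical.</P>
<PRE>
```python
C0_NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
            "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()
NEW_LINE_CODES = (0x0A, 0x0B, 0x0C)   # LF, VT, FF


def display_controls(data):
    """Render a byte stream as lines of text with control codes made visible."""
    lines = [""]
    for byte in data:
        if byte in range(0x20):
            lines[-1] += "{" + C0_NAMES[byte] + "}"
        elif byte == 0x7F:
            lines[-1] += "{DEL}"
        elif byte in range(0x20, 0x7F):
            lines[-1] += chr(byte)
        else:
            lines[-1] += "{" + format(byte, "02X") + "}"
        # The code is displayed first, then the new line takes effect.
        if byte in NEW_LINE_CODES:
            lines.append("")
    return lines


if __name__ == "__main__":
    for line in display_controls(b"\x1b[2J\r\nOK"):
        print(line)
```
</PRE>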
<DIV CLASS="navbar"><HR>
<DIV CLASS="navbot">http://vt100.net/docs/vt220-rm/chapter2.html</DIV></DIV>
</BODY>
</HTML>
